Recently there was a most interesting discussion of Claim Extraction Templates in the Evidentia Google+ Community. It was initiated by Paul Harris, who had created Claim Extraction Templates in Notepad, from which he was able to paste the completed Claims into the Catalogue Claims screen of Evidentia. Paul describes his work on Claim Extraction Templates in a Screencast at http://www.screencast.com/t/gYueKj4HOHX. In the ensuing discussion, Kevin Grooms indicated that he had been working toward a similar end, but using Excel.
Extracting Claims from sources and crafting Claim statements is a slow process in Evidentia. Each information item needs to be identified, and then a statement of the Claim it makes must be written and assigned to one or more Subjects. Slowing down the examination of sources is actually helpful: the slower pace means that I don’t skip over details that may not seem important at the moment but which might become important later on. It seemed to me that Claim Extraction Templates could do at least three things for me: help in the examination of various types of sources in order to identify the Claims that they make, aid in drafting more consistent Claim statements, and make entering Claim statements (and assigning their respective Subjects) a little less tedious.
I decided to begin with US census records, using LibreOffice Calc to develop the Claim Extraction Templates. I chose census records because I have a lot of census records to extract, each year’s information is consistent because of the forms used, and there is a lot of information to be extracted for each census record (especially with large families). I started with the 1850 US census and have just completed all the years through 1940. I am not a power spreadsheet user. This was a learn-as-I-go project, and my earlier attempts will be revised on the basis of what I learned later. For the rest of this post, I will describe the process (and results) for building a Claim Extraction Template form for the 1930 US census, and will use the record of my Great Uncle Carl Mieding as an example.
There are three different kinds of data to be accounted for on each census page: general information about the particular page (location, enumeration district, page, date); general information for the particular household (street, address, dwelling#, family#, etc.); and particular information for each individual in the household (name, age, marital status, occupation, et al.). I have developed a separate spreadsheet for each census year. Each spreadsheet has the capacity to contain multiple sheets (a template, and a sheet for each of the households enumerated). I have set up each spreadsheet Template with the capacity for 10 individuals. (For households with more than 10, extra lines can be added to the data entry section of the spreadsheet and additional sections can be added to the Claims section.)
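For readers who think more easily in code than in spreadsheet cells, the three levels of data can be pictured as nested records. This is only a rough sketch in Python, not part of the spreadsheet itself. The class and field names are assumptions based on the examples listed above, and the sample values are placeholders rather than data from a real census page.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Person:
    # Particular information for one individual (one data entry line).
    name: str
    age: str
    marital_status: str
    occupation: str


@dataclass
class Household:
    # General information shared by everyone in the household.
    street: str
    dwelling_number: str
    family_number: str
    members: List[Person] = field(default_factory=list)


@dataclass
class CensusPage:
    # General information about the census page itself.
    location: str
    enumeration_district: str
    page: str
    date: str
    households: List[Household] = field(default_factory=list)


# Placeholder values only; none of this comes from an actual record.
head = Person("Example Head", "40", "Married", "Example occupation")
household = Household("Example Street", "1", "1", members=[head])
page = CensusPage("Example County", "0-0", "1A", "1930", households=[household])
```

Page-level and household-level values are entered once, while the person-level fields repeat for every line, which is why the Claim formulas have to be repeated for each individual.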
In the development of the Claim Extraction Templates for the first census records I worked on, I was entering the formulas directly into the spreadsheet. I began to realize that it was necessary to repeat the formulas for each individual. Unfortunately, when formulas are copied within the spreadsheet, their relative cell references are adjusted to reflect the new location. That resulted in formulas that pointed to the wrong cells. I then began to save the formulas for each of the Claim statements in a Notepad text (.txt) file. I initially work on the Claims for the Head of Household. I test the formulas in the spreadsheet but finish them in Notepad.
Once I have completed the formulas for the Head of Household, I save them in Notepad and save a second copy under a slightly different name. I open the second copy and remove the formulas that are generic for the household (for example, in 1930 the question was asked “Is there a radio in the household?”). The remaining formulas still point to the information related to the Head of Household (line 4 in the spreadsheet). I do a search and replace to change all references to line 4 so that they now point to line 5.
It is quite simple now to copy the formulas to the Claims Templates section of the spreadsheet. I highlight and copy the entire list of formulas for the Head of Household (each one on a single line, no extra lines between them). I click in the first cell where the formulas are to be entered and select “Paste.” The Text Import dialogue box will open. I simply select “OK” and all the formulas are entered in their correct form. I do the same for the next person in the household (using the second set of formulas). Because that data remains on the clipboard, I can now paste it as many times as needed to represent the total number of individuals. (I have included 10 individuals in my spreadsheets.) One other step is needed. After pasting each set of formulas, I do a search and replace all in the spreadsheet (replace 5 with 6 for the 3rd person; replace 6 with 7 for the 4th person; etc.)
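The search-and-replace steps described above could also be scripted. The following is a minimal Python sketch, not the method used to build the spreadsheet. The function name `shift_row` and the sample formula are hypothetical, and it assumes the Head of Household is on line 4, so that 10 individuals occupy lines 4 through 13.

```python
import re


def shift_row(formula: str, old_row: int, new_row: int) -> str:
    """Repoint every cell reference on old_row to new_row."""
    # Match column letters (with optional $ signs) followed by exactly
    # old_row, so B4 and $D$4 match but B40 and B14 do not.
    pattern = re.compile(rf"(\$?[A-Z]{{1,3}}\$?){old_row}(?!\d)")
    return pattern.sub(rf"\g<1>{new_row}", formula)


# Hypothetical formula for the second person (line 5); it is not taken
# from the actual template.
second_person = '=IF(B5="";"";B5&" was "&D5&" years old.")'

# Lines 6 through 13 cover persons 3 to 10 under the assumption above.
later_rows = {row: shift_row(second_person, 5, row) for row in range(6, 14)}
print(later_rows[6])
```

Unlike a plain replace of 5 with 6, the pattern only touches a number that follows a column letter, so other digits in the formula are left alone. It would still change uppercase text inside a quoted string that happens to look like a cell reference, so the results are worth checking by eye.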
I have also constructed formulas for indicating the assignment of each of the Claims to the respective individual, along with Claim Types (birth, residence, occupation, etc.). These, too, are saved as Notepad files and entered by batch copying.
My process for extracting Claims from census records is as follows:
- Open the spreadsheet for that census year
- Create a new sheet for the household to be analyzed
- Name the tab for the new sheet (Surname, First Name, MI)
- Copy the Template to the newly created sheet
- Open a copy of the respective census record and enter the data from it directly into the newly created sheet for that household
- Copy and paste the Claims from the spreadsheet into Evidentia’s Claims Catalogue page
- For each Claim copied from the spreadsheet into Evidentia, I also note the Claim Type(s) for each of the Claims (column 2 in the spreadsheet) and enter that information along with the Assigned Subject.
The Claims extracted from the 1930 US Census for my great Uncle Carl Mieding are as follows:
If you are interested in looking at the spreadsheet file (both the template and the sheet for Carl E. Mieding), you can download it (https://dl.dropboxusercontent.com/u/6904113/Claims%20Template%20-%201930%20US%20Census.xlsx) and open it with your spreadsheet software. That will give you access to all the formulas for computing the Claims. Note that many of the data fields in the Template are already filled with 0 (zero). The formulas in those data fields are set to return a blank Claim if other data is not entered into the cell. You will also note that data entry for a number of the fields is related to a box below the entry cells.
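The blank-Claim behavior can be illustrated outside the spreadsheet. This is a minimal Python sketch under stated assumptions, not a copy of the formulas in the file. The function `make_claim` and the Claim wording are hypothetical, and it assumes that an empty cell or a 0 placeholder means that no data was entered.

```python
def make_claim(template: str, **fields) -> str:
    """Fill in a Claim statement, or return a blank Claim if data is missing."""
    for value in fields.values():
        # Assumption: an empty value or the 0 placeholder means "not entered".
        if value is None or str(value).strip() in ("", "0"):
            return ""
    return template.format(**fields)


# Hypothetical wording and placeholder data, for illustration only.
wording = "{name} was {age} years old at the time of the 1930 US census."

print(make_claim(wording, name="Example Person", age=30))
print(repr(make_claim(wording, name="Example Person", age=0)))
```

The first call prints a complete Claim statement. The second prints an empty string, which corresponds to a blank Claim cell that does not need to be copied into Evidentia.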
The separate text list of formulas for Claim Extraction Templates is found at (https://dl.dropboxusercontent.com/u/6904113/Census%20Claim%20Template%20Formulas%20(1930%20US%20Census).txt); the text list of Claim Types (assignment of Subjects and Types) is found at (https://dl.dropboxusercontent.com/u/6904113/!!!Census%20Claim-type%20Formulas%20(1930%20US%20Census).txt).
As I review and revise the rest of the US Census Claim Extraction Templates, I will post them.
I am finding that the development of the formulas for each record set takes time. Once done, however, it is a simple matter to highlight, copy, and paste the resulting extracted Claims into Evidentia (along with the assignment of Subjects and Types). Thanks to Paul Harris, Kevin Grooms, and all the members of the Google+ Evidentia Community who shared in the discussion.