We're interested in a wide variety of data describing library materials and related objects as well as data capturing community use and evaluation of these objects. Have a look at Data potentially available to LibraryCloud for a sampling of data types we would encourage you to consider.
However, in this initial phase of gathering and evaluating data, we're primarily interested in item data (books and other circulating library materials) and event data (circulation and user-classification info). This could come to us in at least two forms:
- Separate item and event datasets with a shared linking ID (so that for each event record we can tell which item it reports on)
- Merged dataset, in which each record consists of a documented event (for example, a check-out) supplemented with basic item data (at least author, title, publication date and some standard book ID -- ISBN and/or OCLC and/or LCCN, etc.)
- Regarding item data, we'd be more than happy to receive the full bibliographic records for each item. If, for example, you simply wanted to dump the full, raw MARC records, that would be ideal. We often come across incomplete bib records in our current database, and if we have another library's fully cataloged record for the same item, we can merge the data. Also, the richer the information in each record, the better our chances of determining whether two different libraries' records describe the same book.
- If the event data uses any non-self-explanatory codes (for example, for patron type), then it would be very helpful to receive the look-up tables explaining these.
- In general, if the dataset structure's field names are not self-evident, then a short accompanying note as to what the various fields indicate would be much appreciated.
But we're certainly also interested in other kinds of datasets you may have -- see Data potentially available to LibraryCloud for an initial scoping of the very wide range of data LibraryCloud is interested in. When sending these other datasets, please include a short description of what the dataset consists of and its structure.
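To illustrate how the two forms described above relate, here is a minimal Python sketch that turns separate item and event datasets into a merged dataset by way of the shared linking ID. The sample records, the column names (item_id, event_type, patron_type and so on) and the function name are all hypothetical; they are assumptions made for illustration, not a schema we require.

```python
import csv
import io

# Hypothetical sample data. Column names and values are illustrative
# assumptions only, not a required schema.
ITEMS_CSV = """item_id,author,title,pub_date,isbn
1001,Example Author A,Example Title A,1999,0000000001
1002,Example Author B,Example Title B,2005,0000000002
"""

EVENTS_CSV = """event_id,item_id,event_type,event_date,patron_type
1,1001,checkout,2011-05-01,UG
2,1002,checkout,2011-05-02,FAC
3,1001,checkout,2011-05-03,GR
"""


def merge_events_with_items(items_text, events_text, link_field="item_id"):
    """Attach basic item data to each event record via the shared linking ID."""
    # Index the item records by their linking ID for quick lookup.
    items = {
        row[link_field]: row
        for row in csv.DictReader(io.StringIO(items_text))
    }
    merged = []
    for event in csv.DictReader(io.StringIO(events_text)):
        # An event whose item is missing keeps only its own fields.
        item = items.get(event[link_field], {})
        merged.append({**item, **event})
    return merged


merged_records = merge_events_with_items(ITEMS_CSV, EVENTS_CSV)
for record in merged_records:
    print(record["event_date"], record["event_type"], record["title"])
```

Codes such as UG or FAC in the hypothetical patron_type column are the kind of value for which a look-up table would be needed.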
FTP Account to Use
After connecting, change to the "incoming" directory and deposit your files there; that is the only location where the "lil_dropbox" account has write permissions.
Also, note that the "lil_dropbox" account is only for uploading files -- it does not allow deletions, overwrites or modifications to files once they have been uploaded.
After you have uploaded your files, we will transfer them to our data server and then delete them from the FTP dropbox.
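If you would rather script the upload than use an FTP client, the steps above can be sketched with Python's standard ftplib module. This is only a sketch under stated assumptions: the host name and password below are placeholders (not the real connection details), and the function names are hypothetical.

```python
from ftplib import FTP
from pathlib import Path


def upload_to_dropbox(ftp, local_path, remote_dir="incoming"):
    """Upload one file into the dropbox's writable directory.

    `ftp` is an already connected and logged-in ftplib.FTP object.
    """
    local_path = Path(local_path)
    # "incoming" is the only directory the account can write to.
    ftp.cwd(remote_dir)
    with local_path.open("rb") as handle:
        # Binary mode is safe for every format (csv, json, xml, mrc, ...).
        ftp.storbinary(f"STOR {local_path.name}", handle)
    return local_path.name


def main():
    # Placeholders: substitute the host and password you were given.
    host = "ftp.example.org"
    password = "your-password"
    with FTP(host) as ftp:
        ftp.login(user="lil_dropbox", passwd=password)
        upload_to_dropbox(ftp, "harvard_item_data_20110525_1.csv")
```

Call main() once the placeholders are filled in. Because the account cannot overwrite or delete files, a corrected file has to be uploaded under a new name, for example with a different part number or date.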
Since we have only one account for everyone to use, and all the uploaded files will be in the same directory, it would be very helpful if we could keep things organized by following the file-naming format below, with the listed components, in sequence:
- Institution name or other data-depositor identifier
- Type of data (can be whatever short descriptor you find apt)
- Date, in the format: yyyymmdd
- If multiple files: part number
- File extension (corresponding to file format): e.g., csv, json, xml, mrc, sql, etc.
Separate the components with underscores; the file extension follows a period as usual. Examples:
- Data upload consists of 2 files: harvard_item_data_20110525_1.csv and harvard_item_data_20110525_2.csv
- Data upload consists of 1 file: northeastern_event_data_20110525.json or boston_public_library_circulation_data_20110525.xml
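For anyone generating file names in a script, the naming format above could be implemented roughly as follows. This is a minimal sketch: the function name and parameters are hypothetical, and lowercasing the components is our assumption based on the examples, not a stated requirement.

```python
import re
from datetime import date


def build_upload_filename(depositor, data_type, upload_date, extension, part=None):
    """Join the naming components with underscores, then add the extension."""

    def slug(text):
        # Lowercase, and turn each run of non-alphanumerics into one "_".
        return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")

    components = [
        slug(depositor),
        slug(data_type),
        upload_date.strftime("%Y%m%d"),  # yyyymmdd
    ]
    if part is not None:
        # The part number is only present for multi-file uploads.
        components.append(str(part))
    return "_".join(components) + "." + extension.lstrip(".").lower()


print(build_upload_filename("Harvard", "item data", date(2011, 5, 25), "csv", part=1))
# prints: harvard_item_data_20110525_1.csv
```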
Feel free to send along your data in any format you find convenient:
- MARC (feel free to send us your raw MARC for item data)
- Tab-delimited text or CSV
- Any other clearly structured format
We're always happy to talk with you about format, content, and anything else related to transferring data to us, so don't hesitate to get in touch with us at lil at law.harvard.edu.
If you have any suggestions about how we can make this process easier for you, please pass them along.