A Proposal to serialize MARC in JSON

Note: for the backstory and justification of this proposal, see the preceding post.

MARC-in-JSON is a proposed JSON schema for representing MARC records as JSON. It grew out of working with MARC data in MongoDB and is intended to be both a faithful representation of MARC and a logical, useful model for working natively in JSON-centric environments. Ideally, this serialization could eventually replace binary MARC as the default format. The round trip of a record from MARC to JSON and back to MARC is lossless and preserves field and subfield order.

An example MARC bibliographic record, represented as text:

LEADER 01471cjm a2200349 a 4500
001 5674874
005 20030305110405.0
007 sdubsmennmplu
008 930331s1963    nyuppn              eng d
035 $9 (DLC)   93707283
906 $a 7 $b cbc $c copycat $d 4 $e ncip $f 19 $g y-soundrec
010 $a 93707283
028 02 $a CS 8786 $b Columbia
035 $a (OCoLC)13083787
040 $a OClU $c DLC $d DLC
041 0 $d eng $g eng
042 $a lccopycat
050 00 $a Columbia CS 8786
100 1 $a Dylan, Bob, $d 1941-
245 14 $a The freewheelin' Bob Dylan $h [sound recording].
260 $a [New York, N.Y.] : $b Columbia, $c [1963]
300 $a 1 sound disc : $b analog, 33 1/3 rpm, stereo. ; $c 12 in.
500 $a Songs.
511 0 $a The composer accompanying himself on the guitar ; in part with instrumental ensemble.
500 $a Program notes by Nat Hentoff on container.
505 0 $a Blowin' in the wind -- Girl from the north country -- Masters of war -- Down the highway -- Bob Dylan's blues -- A hard rain's a-gonna fall -- Don't think twice, it's all right -- Bob Dylan's dream -- Oxford town -- Talking World War III blues -- Corrina, Corrina -- Honey, just allow me one more chance -- I shall be free.
650 0 $a Popular music $y 1961-1970.
650 0 $a Blues (Music) $y 1961-1970.
856 41 $3 Preservation copy (limited access) $u http://hdl.loc.gov/loc.mbrsrs/lp0001.dyln
952 $a New
953 $a TA28
991 $b c-RecSound $h Columbia CS 8786 $w MUSIC

The same bibliographic record serialized as MARC-in-JSON would appear as follows (pretty-printed with whitespace and line breaks for readability):

{
    "leader":"01471cjm a2200349 a 4500",
    "fields":
    [
        {
            "001":"5674874"
        },
        {
            "005":"20030305110405.0"
        },
        {
            "007":"sdubsmennmplu"
        },
        {
            "008":"930331s1963    nyuppn              eng d"
        },
        {
            "035":
            {
                "subfields":
                [
                    {
                        "9":"(DLC)   93707283"
                    }
                ],
                "ind1":" ",
                "ind2":" "
            }
        },
        {
            "906":
            {
                "subfields":
                [
                    {
                        "a":"7"
                    },
                    {
                        "b":"cbc"
                    },
                    {
                        "c":"copycat"
                    },
                    {
                        "d":"4"
                    },
                    {
                        "e":"ncip"
                    },
                    {
                        "f":"19"
                    },
                    {
                        "g":"y-soundrec"
                    }
                ],
                "ind1":" ",
                "ind2":" "
            }
        },
        {
            "010":
            {
                "subfields":
                [
                    {
                        "a":"   93707283 "
                    }
                ],
                "ind1":" ",
                "ind2":" "
            }
        },
        {
            "028":
            {
                "subfields":
                [
                    {
                        "a":"CS 8786"
                    },
                    {
                        "b":"Columbia"
                    }
                ],
                "ind1":"0",
                "ind2":"2"
            }
        },
        {
            "035":
            {
                "subfields":
                [
                    {
                        "a":"(OCoLC)13083787"
                    }
                ],
                "ind1":" ",
                "ind2":" "
            }
        },
        {
            "040":
            {
                "subfields":
                [
                    {
                        "a":"OClU"
                    },
                    {
                        "c":"DLC"
                    },
                    {
                        "d":"DLC"
                    }
                ],
                "ind1":" ",
                "ind2":" "
            }
        },
        {
            "041":
            {
                "subfields":
                [
                    {
                        "d":"eng"
                    },
                    {
                        "g":"eng"
                    }
                ],
                "ind1":"0",
                "ind2":" "
            }
        },
        {
            "042":
            {
                "subfields":
                [
                    {
                        "a":"lccopycat"
                    }
                ],
                "ind1":" ",
                "ind2":" "
            }
        },
        {
            "050":
            {
                "subfields":
                [
                    {
                        "a":"Columbia CS 8786"
                    }
                ],
                "ind1":"0",
                "ind2":"0"
            }
        },
        {
            "100":
            {
                "subfields":
                [
                    {
                        "a":"Dylan, Bob,"
                    },
                    {
                        "d":"1941-"
                    }
                ],
                "ind1":"1",
                "ind2":" "
            }
        },
        {
            "245":
            {
                "subfields":
                [
                    {
                        "a":"The freewheelin' Bob Dylan"
                    },
                    {
                        "h":"[sound recording]."
                    }
                ],
                "ind1":"1",
                "ind2":"4"
            }
        },
        {
            "260":
            {
                "subfields":
                [
                    {
                        "a":"[New York, N.Y.] :"
                    },
                    {
                        "b":"Columbia,"
                    },
                    {
                        "c":"[1963]"
                    }
                ],
                "ind1":" ",
                "ind2":" "
            }
        },
        {
            "300":
            {
                "subfields":
                [
                    {
                        "a":"1 sound disc :"
                    },
                    {
                        "b":"analog, 33 1/3 rpm, stereo. ;"
                    },
                    {
                        "c":"12 in."
                    }
                ],
                "ind1":" ",
                "ind2":" "
            }
        },
        {
            "500":
            {
                "subfields":
                [
                    {
                        "a":"Songs."
                    }
                ],
                "ind1":" ",
                "ind2":" "
            }
        },
        {
            "511":
            {
                "subfields":
                [
                    {
                        "a":"The composer accompanying himself on the guitar ; in part with instrumental ensemble."
                    }
                ],
                "ind1":"0",
                "ind2":" "
            }
        },
        {
            "500":
            {
                "subfields":
                [
                    {
                        "a":"Program notes by Nat Hentoff on container."
                    }
                ],
                "ind1":" ",
                "ind2":" "
            }
        },
        {
            "505":
            {
                "subfields":
                [
                    {
                        "a":"Blowin' in the wind -- Girl from the north country -- Masters of war -- Down the highway -- Bob Dylan's blues -- A hard rain's a-gonna fall -- Don't think twice, it's all right -- Bob Dylan's dream -- Oxford town -- Talking World War III blues -- Corrina, Corrina -- Honey, just allow me one more chance -- I shall be free."
                    }
                ],
                "ind1":"0",
                "ind2":" "
            }
        },
        {
            "650":
            {
                "subfields":
                [
                    {
                        "a":"Popular music"
                    },
                    {
                        "y":"1961-1970."
                    }
                ],
                "ind1":" ",
                "ind2":"0"
            }
        },
        {
            "650":
            {
                "subfields":
                [
                    {
                        "a":"Blues (Music)"
                    },
                    {
                        "y":"1961-1970."
                    }
                ],
                "ind1":" ",
                "ind2":"0"
            }
        },
        {
            "856":
            {
                "subfields":
                [
                    {
                        "3":"Preservation copy (limited access)"
                    },
                    {
                        "u":"http://hdl.loc.gov/loc.mbrsrs/lp0001.dyln"
                    }
                ],
                "ind1":"4",
                "ind2":"1"
            }
        },
        {
            "952":
            {
                "subfields":
                [
                    {
                        "a":"New"
                    }
                ],
                "ind1":" ",
                "ind2":" "
            }
        },
        {
            "953":
            {
                "subfields":
                [
                    {
                        "a":"TA28"
                    }
                ],
                "ind1":" ",
                "ind2":" "
            }
        },
        {
            "991":
            {
                "subfields":
                [
                    {
                        "b":"c-RecSound"
                    },
                    {
                        "h":"Columbia CS 8786"
                    },
                    {
                        "w":"MUSIC"
                    }
                ],
                "ind1":" ",
                "ind2":" "
            }
        }
    ]
}
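To make the mapping between the two representations concrete, here is a minimal Python sketch that renders a MARC-in-JSON record back into the line-oriented text form used above. The function names `field_to_text` and `record_to_text` are made up for this illustration, and the sample is a shortened excerpt of the record above rather than the full record.

```python
import json

def field_to_text(field):
    """Render one MARC-in-JSON field object as a single line of text."""
    # Each field object carries exactly one key/value pair: tag -> content.
    tag, value = next(iter(field.items()))
    if isinstance(value, str):
        # Control field: the value is the field data itself.
        return f"{tag} {value}"
    # Variable field: both indicators, then subfields in array order.
    parts = [tag, value["ind1"] + value["ind2"]]
    for subfield in value["subfields"]:
        code, data = next(iter(subfield.items()))
        parts.append(f"${code} {data}")
    return " ".join(parts)

def record_to_text(record):
    """Render a whole record: the leader first, then fields in array order."""
    lines = ["LEADER " + record["leader"]]
    lines.extend(field_to_text(f) for f in record["fields"])
    return "\n".join(lines)

# Shortened excerpt of the example record (assumption: three fields only).
sample = json.loads("""
{
  "leader": "01471cjm a2200349 a 4500",
  "fields": [
    {"001": "5674874"},
    {"028": {"subfields": [{"a": "CS 8786"}, {"b": "Columbia"}],
             "ind1": "0", "ind2": "2"}},
    {"650": {"subfields": [{"a": "Popular music"}, {"y": "1961-1970."}],
             "ind1": " ", "ind2": "0"}}
  ]
}
""")

print(record_to_text(sample))
```

Because fields and subfields live in JSON arrays rather than in object keys, repeated tags such as 035, 500, and 650 keep their original positions, which is what makes the ordering guarantee possible.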

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in IETF RFC 2119.

MARC-in-JSON records MUST conform to the following JSON schema:

{
    "description":"A MARC Record",
    "type": "object",
    "properties": {
        "leader": {
            "type": "string",
            "minLength": 24,
            "maxLength": 24
        },
        "fields": {
            "type": "array",
            "items": {
                "type":[
                    {
                        "type": "object",
                        "description":"A MARC Control Field",
                        "additionalProperties":{
                            "type":"string"
                        }
                    },
                    {
                        "type": "object",
                        "additionalProperties":{
                            "type":"object",
                            "description":"A MARC Variable Field",
                            "properties":{
                                "ind1":{
                                    "type":"string",
                                    "minLength":1,
                                    "maxLength":1
                                },
                                "ind2":{
                                    "type":"string",
                                    "minLength":1,
                                    "maxLength":1
                                },
                                "subfields":{
                                    "type":"array",
                                    "items":{
                                        "type":"object",
                                        "description":"A MARC Subfield",
                                        "additionalProperties":{
                                            "type":"string"
                                        }
                                    }
                                }
                            }
                        }
                    }
                ]
            },
            "additionalProperties": false
        }
    },
    "additionalProperties": false
}


MARC-in-JSON consists of four (4) object types:

Record objects
The base representation of the MARC record. It MUST be a JSON object with two properties:

  • leader, which MUST be a string, exactly 24 characters in length.
  • fields, an array which MUST only contain control field and variable field objects.

Record objects MAY be contained in a JSON array.
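Since a document may hold either one record object or an array of them, a consumer can normalize both shapes to a list before processing. The following is only a sketch: `load_records` is a hypothetical helper name, and the leader and field values in the sample are invented placeholders.

```python
import json

def load_records(text):
    """Return a list of record objects from a MARC-in-JSON document.

    A top-level JSON object is treated as a single record; a top-level
    JSON array is treated as a collection of records.
    """
    data = json.loads(text)
    if isinstance(data, dict):
        return [data]
    if isinstance(data, list):
        return data
    raise ValueError("expected a JSON object or array at the top level")

# Placeholder record (assumption: values are made up for illustration).
single = '{"leader": "00000cam a2200000 a 4500", "fields": [{"001": "1"}]}'
collection = "[" + single + ", " + single + "]"

print(len(load_records(single)), len(load_records(collection)))
```

This version parses the whole document into memory at once; a very large collection would need a streaming parser instead, which the standard `json` module does not provide.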

Control field objects
MARC control fields MUST be represented as a JSON object with a single key/value pair. The key MUST be a string conforming to a valid MARC field tag value (generally three alphanumeric characters). The value of the object MUST be a string.

Variable field objects
Variable fields MUST be represented as JSON objects with a single key/value pair. The key MUST be a string conforming to a valid MARC field tag value (generally three alphanumeric characters). The value of the object MUST be a JSON object with three properties:

  • ind1: a one (1) character string representing the 1st MARC field indicator
  • ind2: a one (1) character string representing the 2nd MARC field indicator
  • subfields: an array containing at least one subfield object

Subfield objects
MARC subfields MUST be represented as JSON objects with a single key/value pair. The key MUST be a string conforming to a valid MARC subfield code value (generally a single alphanumeric character). The value MUST be a string representing the value of the subfield. A subfield object MUST only appear in a variable field object subfields array.
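The structural rules for the four object types can be checked by hand without a schema library. The sketch below is one possible reading of those rules, not a reference validator: `check_record` is a hypothetical name, and it deliberately does not test whether tags or subfield codes are valid MARC values.

```python
def check_record(record):
    """Return a list of problems found; an empty list means none were found."""
    if not isinstance(record, dict):
        return ["record must be a JSON object"]
    problems = []
    leader = record.get("leader")
    if not isinstance(leader, str) or len(leader) != 24:
        problems.append("leader must be a string of exactly 24 characters")
    fields = record.get("fields")
    if not isinstance(fields, list):
        return problems + ["fields must be an array"]
    for i, field in enumerate(fields):
        if not isinstance(field, dict) or len(field) != 1:
            problems.append(f"field {i}: must be an object with one key/value pair")
            continue
        tag, value = next(iter(field.items()))
        if isinstance(value, str):
            continue  # control field: a plain string value is all that is required
        if not isinstance(value, dict):
            problems.append(f"field {i} ({tag}): value must be a string or an object")
            continue
        # Variable field: two one-character indicators and a subfields array.
        for name in ("ind1", "ind2"):
            ind = value.get(name)
            if not isinstance(ind, str) or len(ind) != 1:
                problems.append(f"field {i} ({tag}): {name} must be a one-character string")
        subfields = value.get("subfields")
        if not isinstance(subfields, list) or not subfields:
            problems.append(f"field {i} ({tag}): subfields must be a non-empty array")
            continue
        for j, sf in enumerate(subfields):
            ok = (isinstance(sf, dict) and len(sf) == 1
                  and isinstance(next(iter(sf.values())), str))
            if not ok:
                problems.append(f"field {i} ({tag}), subfield {j}: must be one code/string pair")
    return problems

good = {
    "leader": "01471cjm a2200349 a 4500",
    "fields": [
        {"001": "5674874"},
        {"042": {"subfields": [{"a": "lccopycat"}], "ind1": " ", "ind2": " "}},
    ],
}
bad = {"leader": "too short", "fields": [{"042": {"subfields": [], "ind1": "", "ind2": " "}}]}

print(check_record(good))
print(len(check_record(bad)))
```

Returning a list of messages instead of raising on the first failure lets a caller report every problem in a record at once.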

The content of a MARC-in-JSON object MUST be UTF-8 encoded, or escaped according to the JSON standard (RFC 4627). MARC-8, UTF-16, and UTF-32 SHALL NOT be used in MARC-in-JSON.
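As an illustration of the two permitted forms, the Python standard library's `json` module can emit either escaped ASCII or literal characters that are then encoded as UTF-8. The subfield value here is an invented example, not taken from the record above.

```python
import json

# Invented subfield value containing non-ASCII characters.
subfield = {"a": "Dvořák, Antonín,"}

# Form 1: every non-ASCII character is written as a \uXXXX escape
# (this is the default behaviour of json.dumps).
escaped = json.dumps(subfield)

# Form 2: characters are kept literal and the document is encoded as UTF-8.
literal = json.dumps(subfield, ensure_ascii=False).encode("utf-8")

# Both forms decode to the same subfield value.
assert json.loads(escaped) == json.loads(literal.decode("utf-8")) == subfield

print(escaped)
print(literal)
```

Either form is safe to exchange. Data that started life as MARC-8 has to be converted to Unicode before it reaches this step, which this sketch does not attempt.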

There are currently two implementations conforming to this specification for serialization:

12 comments
  1. Should we think about transforming the leader? Record length is meaningless in this context — maybe make it all zeros?

    Also, the spec requires utf-8, so maybe the encoding char in the leader should be forced to ‘a’ within marc-in-json.

  2. Also, what to do about serializing multiple records in a collection? Requiring a JSON pull-parser might be more of a challenge in some languages than others.

    One (perfectly valid) option is to ignore it, and explicitly state that this is a single-record serialization.

  3. Ross said:

    I have no problem with the record length suggestion. What does marcxml do? I figure copying its behavior makes the most sense.

    I’m not sure, however, about forcing the encoding char to ‘a’. Just because JSON requires UTF-8, it doesn’t mean the record wasn’t inadvertently serialized from a MARC-8 record. At least a valid leader would signal that some ‘buyer beware’ actions need to be taken on the part of the consumer.

    As far as single vs. multiple objects in the collection, I say this should work just like marcxml: if the first character is a “[” it’s a collection, if it’s a “{”, it’s a single record. It’s hard to justify newline-delimited JSON until there is some standardized way to advertise it.

    I agree that requiring a pull-parser is less than ideal (and it wouldn’t be ‘required’, just ‘recommended’), but given a selection of sub-par alternatives (pull-parser vs. non-standard), I feel compelled to go with the thing we can advertise and consistently document.

    Now, if ways to provide newline-delimited JSON were to improve, then I’m all for it. Also, this says nothing about any out of band arrangements you might have.

  4. Robin said:

    The sample here changes the semantics of the MARC a little bit, though I suspect that it’s just the source formatting doing it.

    E.g. the $a on:

    260 $a [New York, N.Y.] : $b Columbia, $c [1963]

    becomes:

    “a”:”
    [
    New York,
    N.Y.
    ]
    :”

    introducing newlines where they shouldn’t be (not that they should be in MARC anyway, but that doesn’t mean they don’t happen sometimes…)

