
Collection Analysis


On the Journal Data sub-pages, we provide information about our data sources. Matching data across these sources can be difficult because the International Standard Serial Number (ISSN) is a weak matchpoint (for example, the print and electronic versions of a journal usually have different ISSNs), so below we explain how we combine the data and validate the results.
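Before ISSNs can serve as a matchpoint at all, they usually need cleanup, because the same number can arrive with or without its hyphen, with a lowercase "x" check digit, with stray spaces, or simply mistyped. The following is a minimal Python sketch of that cleanup step, not Mankato's documented workflow: the `normalize_issn` helper is hypothetical. It standardizes the format and rejects values that fail the standard ISSN check digit (weights 8 down to 2 on the first seven digits, modulus 11, with 10 written as "X").

```python
import re


def normalize_issn(raw):
    """Return an ISSN as 'NNNN-NNNC', or None if it is malformed.

    Hypothetical cleanup helper: drops everything except digits and X,
    then verifies the modulus-11 check digit.
    """
    if raw is None:
        return None
    chars = re.sub(r"[^0-9Xx]", "", str(raw)).upper()
    # Need exactly 7 digits plus one check character.
    if len(chars) != 8 or not chars[:7].isdigit():
        return None
    # Weight the first seven digits 8, 7, ..., 2 and sum them.
    total = sum(int(d) * w for d, w in zip(chars[:7], range(8, 1, -1)))
    remainder = (11 - total % 11) % 11
    expected = "X" if remainder == 10 else str(remainder)
    if chars[7] != expected:
        return None  # typo or otherwise invalid number
    return f"{chars[:4]}-{chars[4:]}"


print(normalize_issn("ISSN 0378-5955"))  # 0378-5955
print(normalize_issn("2434-561x"))       # 2434-561X
print(normalize_issn("0378-5954"))       # None (bad check digit)
```

A passing check digit only shows that a number is well formed. It cannot tell whether the ISSN is the print or the electronic one, or whether a source attached it to the right title, so matched results still need validation.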


How to combine journal data -- the basics:

The following video was prepared for the SUNYLA Conference in 2023. I move through basic data matching quickly because the timeslot was only about 20 minutes:

Mankato has iteratively developed a process to normalize journal data across numerous data sources, but the current version of the process can normalize data against only one "keylist" (the journal list to which records from the other sources are matched) at a time. Data normalization is enormously important because it is the basis for efficient and accurate report production: normalized data can be reused across many different kinds of reports. In addition, although we rely on data from many sources, we can add or revise data in our reports in a "modular" fashion, because any new data can be normalized against the keylist relatively efficiently.
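To make the keylist idea concrete, here is a minimal Python sketch of attaching one source's records to a keylist. It assumes every journal is identified by any of its already-cleaned ISSNs. The function name and the `key_id`/`issns` fields are hypothetical, and Mankato's actual process may use other matchpoints and manual review. Records that match no keylist journal, or that match more than one, are set aside rather than guessed.

```python
from collections import defaultdict


def normalize_to_keylist(keylist, source_records):
    """Attach each source record to one keylist journal via shared ISSNs.

    Returns (matched, unmatched): matched maps key_id -> list of records;
    unmatched holds records with no hit or an ambiguous hit, for review.
    """
    # Index: ISSN -> set of keylist IDs carrying that ISSN.
    index = defaultdict(set)
    for entry in keylist:
        for issn in entry["issns"]:
            index[issn].add(entry["key_id"])

    matched = defaultdict(list)
    unmatched = []
    for record in source_records:
        hits = set()
        for issn in record["issns"]:
            hits |= index.get(issn, set())  # .get avoids adding keys
        if len(hits) == 1:
            matched[next(iter(hits))].append(record)
        else:
            unmatched.append(record)  # zero or several candidate journals
    return dict(matched), unmatched


# Hypothetical keylist and usage data.
keylist = [
    {"key_id": "J1", "issns": ["0378-5955"]},
    {"key_id": "J2", "issns": ["0317-8471", "2434-561X"]},
]
usage_report = [
    {"title": "Journal B", "issns": ["2434-561X"], "downloads": 120},
    {"title": "Journal A", "issns": ["0378-5955"], "downloads": 40},
    {"title": "Not on keylist", "issns": ["1234-5679"], "downloads": 5},
]
matched, unmatched = normalize_to_keylist(keylist, usage_report)
```

Because the keylist is just a parameter, the same function could in principle be rerun with a different journal list, and a new data source only needs one more pass against the existing keylist, which is the "modular" benefit described above.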

We can use any journal list as a keylist. Our standardized reports are based on (1) the full SCImago list of journals and (2) the full list of our subscription journals (whether individual or package subscriptions). The first is the basis for the CPBI, and the second is the basis for the Collection Review report, which in turn is summarized in the Package Level Analysis report. For accreditation visits, we also often create custom reports based on subject indexes or other journal lists.


I should note that we have explored more ambitious data normalization. Instead of normalizing data against a single keylist at a time, we could, in theory, normalize data all at once, from and to all data sources, but we haven't had the time (and I can't imagine when we'll have the time) to develop a sustainable method for this more ambitious approach. None of the members of the Mankato CMT team work full-time on collection analysis, and there is no programmer support. Estimated scheduled work time devoted to collection analysis, by staff member: librarian 1 (project manager), 20%; librarians 2 and 3, 2-3%; technician, 15%. Other librarians may get involved in specific projects. Librarians 1 and 2 most often give additional off-hours time to this project, because they consider the work important for supporting system, university, and library strategic priorities with meaningful evidence. Librarian 1 recently increased the scheduled work time devoted to this project from about 10% to about 20% because of library priorities, including support for a massive general collection reduction plan, a custom report request to support instruction, preparation of the bi-annual Collection Review, and preparation of the CPBI for its production launch. There aren't normally quite so many reporting projects all at once.
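The "all at once" approach mentioned above is not something Mankato has built. Purely as an illustration of what it could involve, one common technique is to cluster records from every source together with a union-find (disjoint-set) structure, treating any shared ISSN as a link. The following is a minimal Python sketch under that assumption. The `cluster_by_issn` name and the record fields are hypothetical.

```python
def cluster_by_issn(records):
    """Group records from all sources that are linked by shared ISSNs.

    Two records land in the same cluster if they share an ISSN directly
    or through a chain of other records. Records with no ISSN are
    returned separately for manual handling.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    no_issn = [rec for rec in records if not rec["issns"]]
    linked = [rec for rec in records if rec["issns"]]

    # Link every ISSN on a record to that record's first ISSN.
    for rec in linked:
        first = rec["issns"][0]
        find(first)  # registers single-ISSN records too
        for other in rec["issns"][1:]:
            union(first, other)

    clusters = {}
    for rec in linked:
        clusters.setdefault(find(rec["issns"][0]), []).append(rec)
    return list(clusters.values()), no_issn


records = [
    {"source": "A", "issns": ["0378-5955"]},
    {"source": "B", "issns": ["0378-5955", "0317-8471"]},
    {"source": "C", "issns": ["0317-8471"]},
    {"source": "D", "issns": ["1234-5679"]},
    {"source": "E", "issns": []},
]
clusters, no_issn = cluster_by_issn(records)
```

The ISSN weakness noted at the top of this page matters even more here: one wrong ISSN on a single record would merge two unrelated journals into one cluster. That is one reason a sustainable version of this approach would likely need ongoing validation effort.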

This work is licensed under a Creative Commons Attribution 4.0 International License