Listing Member Nodes

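The authoritative list of DataONE nodes, covering the Member Nodes as well as the Coordinating Nodes themselves, is provided by the /node endpoint of the Coordinating Nodes. This notebook shows how to retrieve that list with Python (requests and ElementTree), extract a few fields of interest, and plot the nodes on a map with ipyleaflet.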

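The response from the /node endpoint is an XML document containing one node element per node. Each element carries the node's identifier, name, base URL, supported services, synchronization schedule, and descriptive properties. For example: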

<node replicate="false" synchronize="false" type="mn" state="up">
    <identifier>urn:node:PANGAEA</identifier>
    <name>PANGAEA</name>
    <description>Data publisher for Earth &amp; Environmental Science</description>
    <baseURL>https://pangaea-orc-1.dataone.org/mn</baseURL>
    <services>
        <service name="MNCore" version="v1" available="true"/>
        <service name="MNRead" version="v1" available="true"/>
        <service name="MNAuthorization" version="v1" available="true"/>
        <service name="MNStorage" version="v1" available="true"/>
        <service name="MNReplication" version="v1" available="true"/>
        <service name="MNCore" version="v2" available="true"/>
        <service name="MNRead" version="v2" available="true"/>
        <service name="MNAuthorization" version="v2" available="true"/>
        <service name="MNStorage" version="v2" available="true"/>
        <service name="MNReplication" version="v2" available="true"/>
    </services>
    <synchronization>
        <schedule hour="*" mday="*" min="11" mon="*" sec="0" wday="?" year="*"/>
        <lastHarvested>2018-05-03T03:01:02.868+00:00</lastHarvested>
        <lastCompleteHarvest>1900-01-01T00:00:00.000+00:00</lastCompleteHarvest>
    </synchronization>
    <subject>CN=urn:node:PANGAEA,DC=dataone,DC=org</subject>
    <contactSubject>CN=M I A213106,O=Google,C=US,DC=cilogon,DC=org</contactSubject>
    <property key="CN_node_name">PANGAEA Data Publisher for Earth and Environmental Science</property>
    <property key="CN_operational_status">operational</property>
    <property key="CN_logo_url">https://raw.githubusercontent.com/DataONEorg/member-node-info/master/production/graphics/web/PANGAEA.png</property>
    <property key="CN_date_upcoming">2017-11-14T22:00:00</property>
    <property key="CN_info_url">https://www.pangaea.de/</property>
    <property key="CN_location_lonlat">8.8506,53.1101</property>
    <property key="CN_date_operational">2018-03-20T17:46:00.000Z</property>
</node>

This information can be processed with an XML parser such as Python's ElementTree to retrieve specific values of interest.
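Before querying the live service, a minimal offline sketch helps show the parsing step on its own. It parses a trimmed copy of the PANGAEA node element from the example above (no network access needed) and pulls out the identifier, type, state, operational status, and location. CN_location_lonlat stores longitude first, so the pair is swapped into (latitude, longitude). The helper name parse_node is hypothetical, chosen for illustration, and is not part of any DataONE library.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the example <node> element above
SAMPLE_NODE = """
<node replicate="false" synchronize="false" type="mn" state="up">
    <identifier>urn:node:PANGAEA</identifier>
    <name>PANGAEA</name>
    <property key="CN_operational_status">operational</property>
    <property key="CN_location_lonlat">8.8506,53.1101</property>
</node>
"""

def parse_node(node):
    """Hypothetical helper: extract a few fields from one <node> element."""
    lonlat = node.findtext("property[@key='CN_location_lonlat']")
    location = None
    if lonlat is not None:
        # The property is "longitude,latitude"; swap to (latitude, longitude)
        lon, lat = map(float, lonlat.split(","))
        location = (lat, lon)
    return {
        "node_id": node.findtext("identifier"),
        "type": node.get("type"),    # attribute on the <node> element
        "state": node.get("state"),  # attribute on the <node> element
        "status": node.findtext("property[@key='CN_operational_status']"),
        "location": location,
    }

entry = parse_node(ET.fromstring(SAMPLE_NODE.strip()))
print(entry)
```

ElementTree's limited XPath support is enough here: the [@key='...'] predicate selects the property element by its key attribute, and findtext returns None when no such element exists.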

import requests
import xml.etree.ElementTree as ET

# The /node document endpoint (v2 API) on the Coordinating Nodes
url = "https://cn.dataone.org/cn/v2/node"

response = requests.get(url, timeout=60)
response.raise_for_status()  # fail early on an HTTP error
node_tree = ET.fromstring(response.content)
nodes = []

# Extract the node entry items of interest.
# Nodes without a CN_location_lonlat property cannot be mapped, so they are skipped.
for node in node_tree.iter("node"):
    node_coords = node.findtext("property[@key='CN_location_lonlat']")
    if node_coords is None:
        continue
    entry = {"node_id": node.findtext("identifier"),
             "name": node.findtext("name"),
             "type": node.get("type"),
             "state": node.get("state"),
             "status": node.findtext("property[@key='CN_operational_status']", default=""),
            }
    # CN_location_lonlat is "longitude,latitude"; leaflet wants latitude first
    lon, lat = map(float, node_coords.split(","))
    entry["location"] = (lat, lon)
    nodes.append(entry)

# Display the node list
for n in nodes:
    print(f"{n['node_id']:20} {n['type']:3} {n['state']:4} {n['status']:14} {n['name']:40}")

urn:node:CNUNM1      cn  up   operational    cn-unm-1
urn:node:CNUCSB1     cn  up   operational    cn-ucsb-1
urn:node:CNORC1      cn  up   operational    cn-orc-1
urn:node:KNB         mn  up   operational    KNB Data Repository
urn:node:ESA         mn  up   operational    ESA Data Registry
urn:node:SANPARKS    mn  up   operational    SANParks Data Repository
urn:node:ORNLDAAC    mn  down operational    ORNL DAAC
urn:node:LTER        mn  up   operational    U.S. LTER Network
urn:node:CDL         mn  up   operational    UC3 Merritt
urn:node:PISCO       mn  up   operational    PISCO MN
urn:node:ONEShare    mn  up   operational    ONEShare DataONE Member Node
urn:node:mnORC1      mn  up   replicator     DataONE ORC Dedicated Replica Server
urn:node:mnUNM1      mn  up   replicator     DataONE UNM Dedicated Replica Server
urn:node:mnUCSB1     mn  up   replicator     DataONE UCSB Dedicated Replica Server
urn:node:TFRI        mn  up   operational    TFRI Data Catalog
urn:node:USANPN      mn  down contributing   USA National Phenology Network
urn:node:SEAD        mn  up   operational    SEAD Virtual Archive
urn:node:GOA         mn  up   operational    Gulf of Alaska Data Portal
urn:node:KUBI        mn  down operational    University of Kansas - Biodiversity Institute
urn:node:LTER_EUROPE mn  up   operational    LTER Europe Member Node
urn:node:DRYAD       mn  up   operational    Dryad Digital Repository
urn:node:CLOEBIRD    mn  up   operational    Cornell Lab of Ornithology - eBird
urn:node:EDACGSTORE  mn  up   operational    EDAC Gstore Repository
urn:node:IOE         mn  up   operational    Montana IoE Data Repository
urn:node:US_MPC      mn  up   operational    Minnesota Population Center
urn:node:EDORA       mn  down operational    Environmental Data for the Oak Ridge Area (EDORA)
urn:node:RGD         mn  down operational    Regional and Global biogeochemical dynamics Data (RGD)
urn:node:GLEON       mn  down contributing   GLEON Data Repository
urn:node:IARC        mn  up   operational    IARC Data Archive
urn:node:NMEPSCOR    mn  up   operational    NM EPSCoR Tier 4 Node
urn:node:TERN        mn  up   operational    TERN Australia
urn:node:NKN         mn  up   operational    Northwest Knowledge Network
urn:node:USGS_SDC    mn  up   operational    USGS Science Data Catalog
urn:node:NRDC        mn  up   operational    NRDC DataONE member node
urn:node:NCEI        mn  up   operational    NOAA NCEI Oceanographic Data Archive
urn:node:PPBIO       mn  up   operational    PPBio
urn:node:NEON        mn  up   operational    NEON Member Node
urn:node:TDAR        mn  up   operational    The Digital Archaeological Record
urn:node:ARCTIC      mn  up   operational    Arctic Data Center
urn:node:BCODMO      mn  up   operational    Biological and Chemical Oceanography Data Management Office (BCO-DMO)
urn:node:GRIIDC      mn  up   operational    Gulf of Mexico Research Initiative Information and Data Cooperative (GRIIDC)
urn:node:R2R         mn  up   operational    Rolling Deck to Repository (R2R)
urn:node:EDI         mn  up   operational    Environmental Data Initiative
urn:node:UIC         mn  up   operational    A Member Node for University of Illinois at Chicago.
urn:node:RW          mn  up   operational    Research Workspace
urn:node:FEMC        mn  up   operational    Forest Ecosystem Monitoring Cooperative Member Node
urn:node:OTS_NDC     mn  up   operational    Organization for Tropical Studies - Neotropical Data Center
urn:node:PANGAEA     mn  up   operational    PANGAEA
urn:node:ESS_DIVE    mn  up   operational    ESS-DIVE: Deep Insight for Earth Science Data
urn:node:CAS_CERN    mn  up   operational    Chinese Ecosystem Research Network (CERN)

Now display the nodes on a map using the ipyleaflet extension.

First, group nodes that are close to each other so they can be drawn as a marker cluster.

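def computeGroupCentroid(nodes):
    """Return the mean (latitude, longitude) of a list of node entries."""
    lat = sum(n["location"][0] for n in nodes) / len(nodes)
    lon = sum(n["location"][1] for n in nodes) / len(nodes)
    return (lat, lon)

def computeDistance(a, b):
    """Planar distance in degrees between two (lat, lon) points.

    A rough approximation, but adequate for deciding which markers to cluster.
    """
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

# Maximum distance, in degrees, between a node and a group centroid for the node to join it
GROUP_RADIUS = 5.0

# Initialize the groups with the first node.
# Each entry in node_groups is a list of nodes that are close to the centroid of those nodes.
node_groups = [[nodes[0]]]
for node in nodes[1:]:
    for gnodes in node_groups:
        gc = computeGroupCentroid(gnodes)
        if computeDistance(node["location"], gc) < GROUP_RADIUS:
            gnodes.append(node)
            break  # add each node to at most one group, so no marker is drawn twice
    else:
        # No existing group is close enough, so start a new one
        node_groups.append([node])
print(f"Grouped {len(nodes)} nodes to {len(node_groups)} groups")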

Grouped 50 nodes to 24 groups
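The grouping step is a simple greedy pass: each node joins the first existing group whose centroid lies within the threshold, and otherwise starts a new group. As a standalone sketch of that idea, the hypothetical function group_points below works on bare (latitude, longitude) tuples with the same 5 degree planar threshold, so it can be tried on a handful of made-up points without fetching the node list.

```python
from math import hypot

def group_points(points, radius=5.0):
    """Greedily group (lat, lon) points around running group centroids.

    Distances are planar, in degrees, mirroring the rough approximation above.
    """
    groups = []
    for p in points:
        for g in groups:
            # Centroid of the group as it currently stands
            c_lat = sum(q[0] for q in g) / len(g)
            c_lon = sum(q[1] for q in g) / len(g)
            if hypot(p[0] - c_lat, p[1] - c_lon) < radius:
                g.append(p)
                break  # each point joins at most one group
        else:
            groups.append([p])  # nothing close enough: start a new group
    return groups

# Made-up, approximate points: two in Southern California, one near Albuquerque, one near Bremen
points = [(34.4, -119.8), (34.0, -118.5), (35.1, -106.6), (53.1, 8.85)]
groups = group_points(points)
print(f"Grouped {len(points)} points to {len(groups)} groups")
```

Because a group's centroid moves as members are added, the result of this greedy pass depends on the order of the input, and it is not guaranteed to find the fewest possible groups. That is acceptable here, since the groups only decide which markers are clustered on the map.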

Now render the nodes using ipyleaflet.

from ipyleaflet import Map, Marker, CircleMarker, MarkerCluster

def make_marker(node):
    """Use a pin Marker for Member Nodes and a CircleMarker for Coordinating Nodes."""
    if node["type"] == "mn":
        return Marker(location=node["location"], draggable=False, title=node["name"])
    return CircleMarker(location=node["location"])

m = Map(center=(30, -40), zoom=2)
for ng in node_groups:
    if len(ng) == 1:
        # An isolated node is drawn directly on the map
        m.add_layer(make_marker(ng[0]))
    else:
        # Nearby nodes are combined into a cluster that expands on zoom
        m.add_layer(MarkerCluster(markers=[make_marker(node) for node in ng]))
m