Listing Member Nodes
The authoritative list of Member Nodes is provided by the /node endpoint of the Coordinating Nodes. This notebook shows how to retrieve that information using the command line tools curl and xmlstarlet.
The response from the /node endpoint is an XML document with multiple node elements, each containing details of a node. For example:
<node replicate="false" synchronize="false" type="mn" state="up">
  <identifier>urn:node:PANGAEA</identifier>
  <name>PANGAEA</name>
  <description>Data publisher for Earth &amp; Environmental Science</description>
  <baseURL>https://pangaea-orc-1.dataone.org/mn</baseURL>
  <services>
    <service name="MNCore" version="v1" available="true"/>
    <service name="MNRead" version="v1" available="true"/>
    <service name="MNAuthorization" version="v1" available="true"/>
    <service name="MNStorage" version="v1" available="true"/>
    <service name="MNReplication" version="v1" available="true"/>
    <service name="MNCore" version="v2" available="true"/>
    <service name="MNRead" version="v2" available="true"/>
    <service name="MNAuthorization" version="v2" available="true"/>
    <service name="MNStorage" version="v2" available="true"/>
    <service name="MNReplication" version="v2" available="true"/>
  </services>
  <synchronization>
    <schedule hour="*" mday="*" min="11" mon="*" sec="0" wday="?" year="*"/>
    <lastHarvested>2018-05-03T03:01:02.868+00:00</lastHarvested>
    <lastCompleteHarvest>1900-01-01T00:00:00.000+00:00</lastCompleteHarvest>
  </synchronization>
  <subject>CN=urn:node:PANGAEA,DC=dataone,DC=org</subject>
  <contactSubject>CN=M I A213106,O=Google,C=US,DC=cilogon,DC=org</contactSubject>
  <property key="CN_node_name">PANGAEA Data Publisher for Earth and Environmental Science</property>
  <property key="CN_operational_status">operational</property>
  <property key="CN_logo_url">https://raw.githubusercontent.com/DataONEorg/member-node-info/master/production/graphics/web/PANGAEA.png</property>
  <property key="CN_date_upcoming">2017-11-14T22:00:00</property>
  <property key="CN_info_url">https://www.pangaea.de/</property>
  <property key="CN_location_lonlat">8.8506,53.1101</property>
  <property key="CN_date_operational">2018-03-20T17:46:00.000Z</property>
</node>
This information can be processed using an XML parser such as Python's ElementTree to retrieve specific values of interest.
import requests
import xml.etree.ElementTree as ET
from pprint import pprint

# The /node document endpoint
url = "https://cn.dataone.org/cn/v2/node"
node_document = requests.get(url).text
node_tree = ET.fromstring(node_document)
nodes = []

# Extract the node entry items of interest
for node in node_tree.iter("node"):
    node_id = node.find("identifier").text
    node_coords = node.find("property[@key='CN_location_lonlat']")
    # Only keep nodes that publish a location
    if node_coords is not None:
        entry = {
            "node_id": node_id,
            "name": node.find("name").text,
            "type": node.get("type"),
            "state": node.get("state"),
            "status": node.find("property[@key='CN_operational_status']").text,
        }
        node_coords = node_coords.text.split(",")
        node_coords = list(map(float, node_coords))
        # reverse coords since leaflet wants latitude first
        entry["location"] = (node_coords[1], node_coords[0])
        nodes.append(entry)

# Display the node list
for n in nodes:
    print(f"{n['node_id']:20} {n['type']:3} {n['state']:4} {n['status']:14} {n['name']:40}")
urn:node:CNUNM1 cn up operational cn-unm-1
urn:node:CNUCSB1 cn up operational cn-ucsb-1
urn:node:CNORC1 cn up operational cn-orc-1
urn:node:KNB mn up operational KNB Data Repository
urn:node:ESA mn up operational ESA Data Registry
urn:node:SANPARKS mn up operational SANParks Data Repository
urn:node:ORNLDAAC mn down operational ORNL DAAC
urn:node:LTER mn up operational U.S. LTER Network
urn:node:CDL mn up operational UC3 Merritt
urn:node:PISCO mn up operational PISCO MN
urn:node:ONEShare mn up operational ONEShare DataONE Member Node
urn:node:mnORC1 mn up replicator DataONE ORC Dedicated Replica Server
urn:node:mnUNM1 mn up replicator DataONE UNM Dedicated Replica Server
urn:node:mnUCSB1 mn up replicator DataONE UCSB Dedicated Replica Server
urn:node:TFRI mn up operational TFRI Data Catalog
urn:node:USANPN mn down contributing USA National Phenology Network
urn:node:SEAD mn up operational SEAD Virtual Archive
urn:node:GOA mn up operational Gulf of Alaska Data Portal
urn:node:KUBI mn down operational University of Kansas - Biodiversity Institute
urn:node:LTER_EUROPE mn up operational LTER Europe Member Node
urn:node:DRYAD mn up operational Dryad Digital Repository
urn:node:CLOEBIRD mn up operational Cornell Lab of Ornithology - eBird
urn:node:EDACGSTORE mn up operational EDAC Gstore Repository
urn:node:IOE mn up operational Montana IoE Data Repository
urn:node:US_MPC mn up operational Minnesota Population Center
urn:node:EDORA mn down operational Environmental Data for the Oak Ridge Area (EDORA)
urn:node:RGD mn down operational Regional and Global biogeochemical dynamics Data (RGD)
urn:node:GLEON mn down contributing GLEON Data Repository
urn:node:IARC mn up operational IARC Data Archive
urn:node:NMEPSCOR mn up operational NM EPSCoR Tier 4 Node
urn:node:TERN mn up operational TERN Australia
urn:node:NKN mn up operational Northwest Knowledge Network
urn:node:USGS_SDC mn up operational USGS Science Data Catalog
urn:node:NRDC mn up operational NRDC DataONE member node
urn:node:NCEI mn up operational NOAA NCEI Oceanographic Data Archive
urn:node:PPBIO mn up operational PPBio
urn:node:NEON mn up operational NEON Member Node
urn:node:TDAR mn up operational The Digital Archaeological Record
urn:node:ARCTIC mn up operational Arctic Data Center
urn:node:BCODMO mn up operational Biological and Chemical Oceanography Data Management Office (BCO-DMO)
urn:node:GRIIDC mn up operational Gulf of Mexico Research Initiative Information and Data Cooperative (GRIIDC)
urn:node:R2R mn up operational Rolling Deck to Repository (R2R)
urn:node:EDI mn up operational Environmental Data Initiative
urn:node:UIC mn up operational A Member Node for University of Illinois at Chicago.
urn:node:RW mn up operational Research Workspace
urn:node:FEMC mn up operational Forest Ecosystem Monitoring Cooperative Member Node
urn:node:OTS_NDC mn up operational Organization for Tropical Studies - Neotropical Data Center
urn:node:PANGAEA mn up operational PANGAEA
urn:node:ESS_DIVE mn up operational ESS-DIVE: Deep Insight for Earth Science Data
urn:node:CAS_CERN mn up operational Chinese Ecosystem Research Network (CERN)
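The same ElementTree calls can read other parts of a node entry, such as its list of services. The following is a minimal sketch that parses an abbreviated copy of the PANGAEA entry shown earlier instead of making a live request. The `SAMPLE_NODE` string and the `services_by_version` helper are illustrative names and are not part of the DataONE API.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Abbreviated copy of the PANGAEA entry shown earlier (illustrative only)
SAMPLE_NODE = """
<node replicate="false" synchronize="false" type="mn" state="up">
  <identifier>urn:node:PANGAEA</identifier>
  <services>
    <service name="MNCore" version="v1" available="true"/>
    <service name="MNRead" version="v1" available="true"/>
    <service name="MNCore" version="v2" available="true"/>
    <service name="MNRead" version="v2" available="true"/>
  </services>
</node>
"""


def services_by_version(node):
    """Hypothetical helper: map each API version to its available service names."""
    result = defaultdict(list)
    for service in node.iterfind("services/service"):
        # Attribute values are strings, so compare against "true"
        if service.get("available") == "true":
            result[service.get("version")].append(service.get("name"))
    return dict(result)


node = ET.fromstring(SAMPLE_NODE.strip())
services = services_by_version(node)
print(node.find("identifier").text, services)
```

Applied to the live /node document, the helper could be called once per element yielded by `node_tree.iter("node")`.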
Now display the nodes on a map using the ipyleaflet extension.
First, group nodes that are close together so they can be drawn with a marker cluster.
def computeGroupCentroid(nodes):
    # Mean (lat, lon) of a group; each "location" is stored as (lat, lon)
    sx = 0
    sy = 0
    for n in nodes:
        sx += n["location"][1]
        sy += n["location"][0]
    return (sy/len(nodes), sx/len(nodes))


def computeDistance(a, b):
    # Euclidean distance, in degrees, between two (lat, lon) pairs
    dx = (a[1]-b[1]) ** 2
    dy = (a[0]-b[0]) ** 2
    return (dx+dy) ** 0.5


# Initialize the groups with the first node.
# Each entry in the node_groups is a list of nodes that are close to the centroid of those nodes.
node_groups = [
    [nodes[0],],
]
for node in nodes[1:]:
    added = False
    # Every group is checked, so a node within range of several centroids joins each of them
    for gnodes in node_groups:
        gc = computeGroupCentroid(gnodes)
        dist = computeDistance(node["location"], gc)
        if dist < 5.0:
            gnodes.append(node)
            added = True
    if not added:
        node_groups.append([node, ])
print(f"Grouped {len(nodes)} nodes to {len(node_groups)} groups")
Grouped 50 nodes to 24 groups
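To make the grouping step easier to follow, here is a small self-contained sketch of the same centroid-and-threshold idea applied to made-up coordinates. The function names, the sample points, and the choice to stop at the first matching group are assumptions for illustration; the notebook cell above checks every group instead. As in the notebook, distance is a plain Euclidean distance in degrees, which is only a rough proxy for distance on the globe.

```python
import math


def centroid(points):
    """Mean (lat, lon) of a list of (lat, lon) tuples."""
    lat = sum(p[0] for p in points) / len(points)
    lon = sum(p[1] for p in points) / len(points)
    return (lat, lon)


def group_by_proximity(points, threshold=5.0):
    """Greedy grouping: add each point to the first group whose centroid is
    closer than `threshold` degrees, otherwise start a new group."""
    groups = []
    for point in points:
        for group in groups:
            if math.dist(point, centroid(group)) < threshold:
                group.append(point)
                break
        else:
            # No existing group was close enough
            groups.append([point])
    return groups


# Hypothetical (lat, lon) pairs: the first and third are near each other,
# the other two are far from everything else
sample_points = [(35.0, -106.6), (34.4, -119.8), (36.0, -105.9), (53.1, 8.85)]
groups = group_by_proximity(sample_points)
print(f"Grouped {len(sample_points)} points to {len(groups)} groups")
```

Because the centroid moves as points are added, the result of this kind of greedy grouping depends on the order in which points are visited.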
Now render the nodes using ipyleaflet.
from ipyleaflet import Map, Marker, CircleMarker, MarkerCluster

m = Map(center=(30, -40), zoom=2)
for ng in node_groups:
    if len(ng) == 1:
        # A group with a single node is drawn as an individual marker
        node = ng[0]
        marker = None
        if node["type"] == "mn":
            marker = Marker(location=node["location"], draggable=False, title=node["name"])
        else:
            marker = CircleMarker(location=node["location"])
        m.add_layer(marker)
    else:
        # Groups with several nodes are drawn as a marker cluster
        markers = []
        for node in ng:
            marker = None
            if node["type"] == "mn":
                marker = Marker(location=node["location"], draggable=False, title=node["name"])
            else:
                marker = CircleMarker(location=node["location"])
            markers.append(marker)
        marker_cluster = MarkerCluster(markers=markers)
        m.add_layer(marker_cluster)
m