Hadoop components are rack-aware. For example, HDFS block placement uses rack awareness for fault tolerance by placing one block replica on a different rack. This provides data availability in the event of a network switch failure or partition within the cluster.
Hadoop master daemons obtain the rack id of the cluster workers by invoking either an external script or a Java class, as specified by the configuration files. Whether a Java class or an external script is used for topology, the output must adhere to the Java org.apache.hadoop.net.DNSToSwitchMapping interface. The interface expects a one-to-one correspondence to be maintained, with the topology information in the format of '/myrack/myhost', where '/' is the topology delimiter, 'myrack' is the rack identifier, and 'myhost' is the individual host. Assuming a single /24 subnet per rack, one could use the format '/192.168.100.0/192.168.100.5' as a unique rack-host topology mapping.
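As a rough illustration of that mapping, the minimal sketch below derives the '/myrack/myhost' style string under the single-/24-subnet-per-rack assumption described above. The addresses are made up, and Python's standard ipaddress module is used here purely for convenience; the full example topology scripts appear later in this section.

import ipaddress

def rack_host_mapping(ip, prefix=24):
    # Derive the rack id from the host's /24 network address, then
    # append the host itself to form the '/myrack/myhost' string.
    network = ipaddress.ip_network('{0}/{1}'.format(ip, prefix), strict=False)
    return '/{0}/{1}'.format(network.network_address, ip)

for ip in ['192.168.100.5', '192.168.101.7']:   # hypothetical worker addresses
    print(rack_host_mapping(ip))                # e.g. /192.168.100.0/192.168.100.5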
To use the Java class for topology mapping, the class name is specified by the net.topology.node.switch.mapping.impl parameter in the configuration file. An example, NetworkTopology.java, is included with the Hadoop distribution and can be customized by the Hadoop administrator. Using a Java class instead of an external script has a performance benefit in that Hadoop does not need to fork an external process when a new worker node registers itself.
If implementing an external script, it is specified with the net.topology.script.file.name parameter in the configuration files. Unlike the Java class, the external topology script is not included with the Hadoop distribution and must be provided by the administrator. Hadoop sends multiple IP addresses to ARGV when it forks the topology script. The number of IP addresses sent to the topology script is controlled by net.topology.script.number.args and defaults to 100. If net.topology.script.number.args were changed to 1, a topology script would be forked for each IP submitted by DataNodes and/or NodeManagers.
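To make that ARGV contract concrete, here is a minimal sketch of an external topology script built around a hand-maintained lookup table. The addresses and rack names in the table are purely hypothetical, and a static table is only one possible approach; whatever the approach, the script receives up to net.topology.script.number.args addresses per invocation and must print exactly one rack id per address, in the same order.

#!/usr/bin/python3
# Minimal topology script sketch: answer each address passed in ARGV
# from a hand-maintained table, printing one rack id per input.
import sys

RACKS = {
    '192.168.100.5': '/rack1',   # hypothetical addresses and rack names
    '192.168.100.6': '/rack1',
    '192.168.101.5': '/rack2',
}

for name in sys.argv[1:]:        # skip the script name itself
    print(RACKS.get(name, '/rack-unknown'))

A static table like this trades the flexibility of the subnet-based approach shown in the next example for complete, explicit control over rack assignments.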
If neither net.topology.script.file.name nor net.topology.node.switch.mapping.impl is set, the rack id '/default-rack' is returned for every passed IP address. While this behavior may appear desirable, it can cause problems with HDFS block replication: the default policy is to write one replica of a block off-rack, but this cannot be done when there is only a single rack named '/default-rack'.
#!/usr/bin/python3
# This script makes assumptions about the physical environment.
#  1) Each rack is its own layer 3 network with a /24 subnet, which
#     could be typical where each rack has its own switch with
#     uplinks to a central core router.
#
#             +-----------+
#             |core router|
#             +-----------+
#            /             \
#   +-----------+        +-----------+
#   |rack switch|        |rack switch|
#   +-----------+        +-----------+
#   | data node |        | data node |
#   +-----------+        +-----------+
#   | data node |        | data node |
#   +-----------+        +-----------+
#
#  2) The topology script gets a list of IP addresses as input, calculates
#     each network address, and prints '/network_address' as the rack id
#     for each IP.

import netaddr
import sys

sys.argv.pop(0)                              # discard name of topology script from argv list as we just want IP addresses

netmask = '255.255.255.0'                    # set netmask to what's being used in your environment. The example uses a /24

for ip in sys.argv:                          # loop over list of datanode IP's
    address = '{0}/{1}'.format(ip, netmask)  # format address string so it looks like 'ip/netmask' to make netaddr work
    try:
        network_address = netaddr.IPNetwork(address).network  # calculate and print network address
        print("/{0}".format(network_address))
    except:
        print("/rack-unknown")               # print catch-all value if unable to calculate network address
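Because the script above relies on the third-party netaddr module and will be forked by Hadoop with a batch of addresses in ARGV, it is worth smoke-testing it by hand before pointing net.topology.script.file.name at it, checking that exactly one rack id is printed per address. The script path and addresses below are placeholders:

$ python3 topology.py 192.168.100.5 192.168.101.7
/192.168.100.0
/192.168.101.0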
#!/usr/bin/env bash
# Here's a bash example to show just how simple these scripts can be.
# Assuming we have a flat network with everything on a single switch, we can fake a rack topology.
# This could occur in a lab environment where we have a limited number of nodes, like 2-8 physical
# machines on an unmanaged switch. It may also apply to multiple virtual machines running on the
# same physical hardware. The number of machines isn't important; what matters is that we are
# faking a network topology where there isn't one.
#
#       +----------+    +--------+
#       |jobtracker|    |datanode|
#       +----------+    +--------+
#              \        /
#  +--------+  +--------+  +--------+
#  |datanode|--| switch |--|datanode|
#  +--------+  +--------+  +--------+
#              /        \
#       +--------+    +--------+
#       |datanode|    |namenode|
#       +--------+    +--------+
#
# With this network topology, we are treating each host as a rack. This is done by taking the last
# octet of the datanode's IP and prepending it with the string '/rack-'. The advantage of doing this
# is that HDFS can create its 'off-rack' block copy.
#  1) 'echo $@' echoes all ARGV values to xargs.
#  2) 'xargs -n 1' prints a single argv value per line.
#  3) 'awk' splits fields on dots and appends the last field to the string '/rack-'. If awk
#     finds no dots to split on, it still prints '/rack-' followed by the whole value.

echo $@ | xargs -n 1 | awk -F '.' '{print "/rack-"$NF}'