How to check the total allotted space inside an HDFS 'group'


Our DBA has created a schema for our team in HDFS/Hive. I'm not sure 'schema' is the right word; they call it a 'group'.
In any case, we can only write to the data lake inside this schema, whether as Parquet files or Hive tables.
Is there a way to check the maximum space allocated to our group, knowing only the schema name?
I don't want to accidentally load too much data.

Thank you.

How to solve:


Method 1

Space quotas cannot be set at the Hive level alone, because Hive is largely decoupled from HDFS storage. Hive tables can live not only under hive.metastore.warehouse.dir but also, as external tables, in other HDFS directories. Tables of either kind, managed or external, can be loaded by tools other than Hive (for example, you can put files into a table directory manually). HDFS can also hold data that has nothing to do with Hive. So even if Hive could enforce space quotas, it would not be practical, because Hive is not aware of what else you are doing in HDFS.

Read about HDFS space quotas. A space quota can be set on a directory and limits the space used by the files under it. Permissions can also be set for users and groups, allowing them to access certain directories. Directory ACLs and directory space quotas can be combined to restrict users or groups to their allowed directories (with space quotas).

You can check a directory's quota using:

hadoop fs -count -q /path/to/directory
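
The command above prints one line per path with eight whitespace-separated columns: QUOTA, REMAINING_QUOTA, SPACE_QUOTA, REMAINING_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE and PATHNAME. When no quota is set, the quota columns show "none" and "inf". As a minimal sketch, the output could be parsed as below; the names QuotaInfo, parse_count_q and fetch_quota are hypothetical helpers, not part of Hadoop, and the sketch assumes the command is run without the -h flag so that all sizes are plain byte counts.

```python
import subprocess
from typing import NamedTuple, Optional


class QuotaInfo(NamedTuple):
    """One row of `hadoop fs -count -q` output (hypothetical container)."""
    name_quota: Optional[int]             # QUOTA: max number of names, None if unset
    remaining_name_quota: Optional[int]   # REMAINING_QUOTA
    space_quota: Optional[int]            # SPACE_QUOTA in bytes, None if unset
    remaining_space_quota: Optional[int]  # REMAINING_SPACE_QUOTA in bytes
    dir_count: int
    file_count: int
    content_size: int                     # CONTENT_SIZE in bytes
    path: str


def _to_int(field: str) -> Optional[int]:
    # "none" and "inf" are printed when no quota is set on the directory.
    return None if field in ("none", "inf") else int(field)


def parse_count_q(line: str) -> QuotaInfo:
    """Parse a single output line; assumes byte counts (no -h flag)."""
    parts = line.split(None, 7)  # keep the path intact even if it has spaces
    if len(parts) != 8:
        raise ValueError(f"expected 8 columns, got {len(parts)}: {line!r}")
    return QuotaInfo(
        _to_int(parts[0]), _to_int(parts[1]),
        _to_int(parts[2]), _to_int(parts[3]),
        int(parts[4]), int(parts[5]), int(parts[6]),
        parts[7].strip(),
    )


def fetch_quota(path: str) -> QuotaInfo:
    """Run the command for one directory; assumes `hadoop` is on PATH."""
    result = subprocess.run(
        ["hadoop", "fs", "-count", "-q", path],
        check=True, capture_output=True, text=True,
    )
    return parse_count_q(result.stdout.strip().splitlines()[-1])
```

If space_quota comes back as None, no space quota is set on that directory itself. Note that HDFS charges space quotas for every replica of a block, while CONTENT_SIZE reports the unreplicated size, so the remaining quota can shrink faster than the content size grows.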

To check directory ACL use this command:

hdfs dfs -getfacl [-R] <path>
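
For a single path, getfacl output starts with comment lines such as "# file:", "# owner:" and "# group:", followed by entries like "user::rwx" or "group:<name>:r-x". The sketch below shows one way to read that output and list the entries for a given group; parse_getfacl and group_entries are hypothetical helper names, and the sketch assumes the output of one path (no -R).

```python
from typing import Dict, List, Tuple

# (is_default, kind, name, permissions), e.g. (False, "group", "team_x", "rwx")
AclEntry = Tuple[bool, str, str, str]


def parse_getfacl(output: str) -> Tuple[Dict[str, str], List[AclEntry]]:
    """Split getfacl output for one path into header metadata and ACL entries."""
    meta: Dict[str, str] = {}
    entries: List[AclEntry] = []
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Header lines look like "# owner: hdfs".
            key, _, value = line.lstrip("# ").partition(":")
            meta[key.strip()] = value.strip()
            continue
        # Drop a trailing "#effective:..." annotation if one is present.
        line = line.split("#", 1)[0].strip()
        fields = line.split(":")
        is_default = fields[0] == "default"
        if is_default:
            fields = fields[1:]
        if len(fields) != 3:
            continue  # skip anything that is not kind:name:perms
        kind, name, perms = fields
        entries.append((is_default, kind, name, perms))
    return meta, entries


def group_entries(entries: List[AclEntry], group: str) -> List[AclEntry]:
    """Return the non-default ACL entries that name the given group."""
    return [e for e in entries if not e[0] and e[1] == "group" and e[2] == group]
```

The permissions listed for a named group can be further limited by the mask entry, so treat the result as an upper bound on what the group can do.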

See more in FS shell commands guide.

Also read about Hive Authorization Options.


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
