Question
A Deep Dive into the Query Execution Engine of Spark SQL
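Unresolved Logical Plan
👉🏻Syntactic check: the parser verifies that the query is syntactically valid.
👉🏻Semantic verification: the objects the query refers to (tables, columns) still have to be proven to exist.
👉🏻The parser's output is checked at this stage; the names in the plan are not yet resolved.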
Schema/Metadata Catalogue
👉🏻Data types and schema details are looked up in the catalogue to resolve the objects in the plan.
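The catalogue lookup can be pictured with a small sketch. This is a toy model in plain Python, not Spark's analyzer: the `CATALOG` dictionary, the `employees` table, and the `resolve_columns` helper are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical catalogue: table name -> {column name: data type}.
CATALOG = {
    "employees": {"id": "int", "name": "string", "salary": "double"},
}


@dataclass(frozen=True)
class ResolvedColumn:
    table: str
    name: str
    data_type: str


def resolve_columns(table, column_names, catalog=CATALOG):
    """Bind unresolved column names to typed columns, or fail if one is unknown."""
    if table not in catalog:
        raise LookupError(f"table not found: {table}")
    schema = catalog[table]
    resolved = []
    for name in column_names:
        if name not in schema:
            raise LookupError(f"column not found: {table}.{name}")
        resolved.append(ResolvedColumn(table, name, schema[name]))
    return resolved


# Names from the unresolved plan become typed, resolved columns.
resolved = resolve_columns("employees", ["name", "salary"])
```

A query that mentions a column missing from the catalogue (for example `age` here) fails at this point, before any optimization takes place.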
Logical Plan
👉🏻The Catalyst optimizer is written in Scala and represents a plan as a tree.
👉🏻Each tree is made of nodes, and each node has zero or more child nodes. Rules are applied to the nodes to transform one tree into another.
👉🏻Optimization at each node is rule-based.
👉🏻Several equivalent logical plans can result from applying the rules.
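To make the tree-and-rule idea concrete, here is a minimal sketch in plain Python. It only mimics the concept: the node classes, `transform_up`, and `fold_constants` are illustrative names, and Catalyst itself implements this in Scala with pattern matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class Attribute:
    name: str


@dataclass(frozen=True)
class Add:
    left: object
    right: object


def transform_up(node, rule):
    """Apply `rule` to every node, children first, and return the new tree."""
    if isinstance(node, Add):
        node = Add(transform_up(node.left, rule), transform_up(node.right, rule))
    return rule(node)


def fold_constants(node):
    """Rule: replace Add(Literal, Literal) with a single Literal."""
    if (isinstance(node, Add)
            and isinstance(node.left, Literal)
            and isinstance(node.right, Literal)):
        return Literal(node.left.value + node.right.value)
    return node


# The expression x + (1 + 2) as a tree.
tree = Add(Attribute("x"), Add(Literal(1), Literal(2)))

# After the rule runs, the tree represents x + 3.
optimized = transform_up(tree, fold_constants)
```

The rule only describes the pattern it cares about; nodes that do not match are returned unchanged, which is what lets many small rules be combined.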
Optimized Logical Plan
👉🏻Related optimization rules are grouped into batches
👉🏻Predicate pushdown
👉🏻Projection pushdown
👉🏻Filter reordering
👉🏻Conversion of decimal operations to integer operations
👉🏻Replacement of some regex expressions with Java string methods
👉🏻If-else clause simplification
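As an illustration of one rule from the list above, predicate pushdown can be sketched as a tree rewrite. This is a simplified toy in plain Python under the assumption that a projection only selects columns (no aliases or computed expressions); the `Scan`, `Project`, and `Filter` classes and the rule name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Project:
    columns: tuple
    child: object


@dataclass(frozen=True)
class Filter:
    column: str      # the column the predicate reads
    predicate: str   # predicate text, kept opaque in this toy
    child: object


def push_filter_below_project(plan):
    """Rule: Filter(Project(child)) -> Project(Filter(child)) when the column is projected."""
    if (isinstance(plan, Filter)
            and isinstance(plan.child, Project)
            and plan.column in plan.child.columns):
        project = plan.child
        pushed = Filter(plan.column, plan.predicate, project.child)
        return Project(project.columns, pushed)
    return plan


# Before: the filter runs after the projection.
plan = Filter("age", "age > 30", Project(("name", "age"), Scan("people")))

# After: the filter sits directly on top of the scan.
pushed_plan = push_filter_below_project(plan)
```

Moving the filter closer to the scan means fewer rows flow through the operators above it.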
Physical Plans
👉🏻Catalyst constructs one or more physical plans from the optimized logical plan.
👉🏻A physical plan defines how the data will be computed.
👉🏻The physical plans are optimized as well.
👉🏻This optimization can merge separate filters and push predicates down to the data source, so that some data is eliminated at the source itself.
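The effect of merging filters and handing them to the data source can be sketched as follows. This is a toy in plain Python, not a Spark data source: `ROWS`, `merge_filters`, and `scan` are hypothetical, and the "source" is just an in-memory list.

```python
# Hypothetical rows held by the data source.
ROWS = [
    {"name": "a", "age": 25, "country": "IN"},
    {"name": "b", "age": 41, "country": "IN"},
    {"name": "c", "age": 52, "country": "US"},
]


def merge_filters(predicates):
    """Combine several predicates into one that requires all of them to hold."""
    return lambda row: all(p(row) for p in predicates)


def scan(rows, pushed_predicate=None):
    """Toy data source: rows failing the pushed predicate never leave the source."""
    for row in rows:
        if pushed_predicate is None or pushed_predicate(row):
            yield row


def older_than_30(row):
    return row["age"] > 30


def in_india(row):
    return row["country"] == "IN"


# Two separate filters become one predicate evaluated inside the scan.
combined = merge_filters([older_than_30, in_india])
result = list(scan(ROWS, combined))
```

Only the row for `b` satisfies both conditions, so it is the only row the scan returns to the rest of the plan.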
Selected Physical Plan
👉🏻The optimizer determines which physical plan has the lowest estimated cost of execution and chooses that plan for computation.
👉🏻Cost is a metric that estimates how expensive each plan is to execute.
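Cost-based selection can be pictured as picking the minimum over candidate plans. The sketch below is a toy in plain Python: the cost formula, the weights, and the row counts are invented for illustration and are not Spark's actual cost model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalPlan:
    name: str
    rows_shuffled: int   # rows moved across the network
    rows_scanned: int    # rows read from storage


# Assumed weights: moving a row is treated as 5x as costly as reading it.
SHUFFLE_WEIGHT = 5
SCAN_WEIGHT = 1


def estimate_cost(plan):
    """Hypothetical cost metric: weighted sum of shuffled and scanned rows."""
    return SHUFFLE_WEIGHT * plan.rows_shuffled + SCAN_WEIGHT * plan.rows_scanned


def select_plan(candidates):
    """Pick the candidate with the lowest estimated cost."""
    return min(candidates, key=estimate_cost)


# Two candidate plans for the same join, with made-up statistics.
candidates = [
    PhysicalPlan("sort_merge_join", rows_shuffled=1_000_000, rows_scanned=1_010_000),
    PhysicalPlan("broadcast_hash_join", rows_shuffled=10_000, rows_scanned=1_010_000),
]
best = select_plan(candidates)
```

With these numbers the broadcast plan is cheaper because it shuffles far fewer rows, so it is the one selected.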
Executable Native RDD
👉🏻The optimizer generates Java bytecode for the best physical plan. In Catalyst's original design, the generation is made possible by a Scala feature called quasiquotes.
👉🏻This step is a code-based optimization: the query runs as generated code instead of being interpreted.
👉🏻Catalyst relies on a special feature of the Scala language: quasiquotes.
👉🏻Quasiquotes make code generation easier.
👉🏻Quasiquotes allow the programmatic construction of abstract syntax trees (ASTs) in Scala, which can then be fed to the Scala compiler at runtime to generate bytecode.
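Quasiquotes are specific to Scala, but the idea of building an AST programmatically and compiling it at runtime can be sketched by analogy with Python's standard `ast` module. This is an analogy only, not how Catalyst is implemented: the expression classes, `to_ast`, and `compile_expression` are hypothetical names, and the row is assumed to be a plain dictionary.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class Attribute:
    name: str


@dataclass(frozen=True)
class Add:
    left: object
    right: object


def to_ast(node):
    """Translate a toy expression tree into a Python AST expression."""
    if isinstance(node, Literal):
        return ast.Constant(value=node.value)
    if isinstance(node, Attribute):
        # Emits the call: row.get("<name>")
        return ast.Call(
            func=ast.Attribute(value=ast.Name(id="row", ctx=ast.Load()),
                               attr="get", ctx=ast.Load()),
            args=[ast.Constant(value=node.name)],
            keywords=[],
        )
    if isinstance(node, Add):
        return ast.BinOp(left=to_ast(node.left), op=ast.Add(), right=to_ast(node.right))
    raise TypeError(f"unsupported node: {node!r}")


def compile_expression(node):
    """Wrap the expression in `lambda row: ...` and compile it at runtime."""
    args = ast.arguments(posonlyargs=[], args=[ast.arg(arg="row")], vararg=None,
                         kwonlyargs=[], kw_defaults=[], kwarg=None, defaults=[])
    tree = ast.Expression(body=ast.Lambda(args=args, body=to_ast(node)))
    ast.fix_missing_locations(tree)
    return eval(compile(tree, filename="<generated>", mode="eval"))


# (x + y) + 1 becomes a compiled function of a row.
evaluate = compile_expression(Add(Add(Attribute("x"), Attribute("y")), Literal(1)))
result = evaluate({"x": 2, "y": 3})
```

The expression tree is walked once to produce code; after that, every row is evaluated by the compiled function rather than by re-walking the tree.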